Analysis of bacterial diversity and community structure in gastric juice of patients with advanced gastric cancer

Background The occurrence and development of gastric cancer are related to microorganisms, which can be used as potential biomarkers of gastric cancer. Objective To screen microbiological markers of gastric cancer from the microorganisms of gastric juice. Methods Gastric juice samples were collected from 61 healthy people and 78 patients with gastric cancer (48 cases of early gastric cancer and 30 cases of advanced gastric cancer). The bacterial 16S rRNA V1-V4 region of the gastric juice samples was sequenced. The Shannon index, Simpson index, Ace index and Chao index were used to analyze the diversity of the gastric juice samples. The RDP classifier Bayesian algorithm was used to analyze the community structure based on the representative sequences of OTUs clustered at 97% similarity. Linear discriminant analysis and Student's t-test were used to analyze the differences. Six machine learning algorithms, including the logistic regression algorithm, random forest algorithm, neural network algorithm, support vector machine algorithm, CatBoost algorithm and gradient boosting decision tree algorithm, were used to construct risk prediction models for gastric cancer and advanced gastric cancer. Results The microbiota diversity and the abundance of bacteria differed among the healthy group, the early gastric cancer group and the advanced gastric cancer group (P < 0.05). The five most abundant bacteria among the three groups were Streptococcus, Rhodococcus, Prevotella, Pseudomonas and Helicobacter. Bacterial flora such as Streptococcus, Rhodococcus and Ochrobactrum were significantly different between the healthy group and the gastric cancer group. The random forest prediction model had the highest accuracy (82.73%). The bacteria with the highest predictive value included Streptococcus, Lactobacillus and Ochrobactrum. The abundance of bacteria such as Fusobacterium, Capnocytophaga, Atopobium and Corynebacterium was high in the advanced gastric cancer group.
Conclusion Gastric juice bacteria can be used as potential biomarkers to predict the occurrence and development of gastric cancer. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-023-00612-7


that Porphyromonas, Neisseria and Streptococcus were reduced in the gastric mucosa of GC patients, and Lactobacillus coleohominis, Lachnospira Bryant and Pseudomonas aeruginosa were increased. These results suggest that these bacteria may play a role in the process from gastritis to intestinal metaplasia to GC. Yu et al. [15], using 16S ribosomal RNA gene sequencing analysis and the PICRUSt bioinformatics software package, showed that the diversity of bacteria in the stomach may play an important role in the occurrence of gastric cardia cancer. Sung [16] showed that human gastric juice contains different microbial groups mainly composed of Fusobacteria, Actinomycetes, Bacteroidetes, Firmicutes and Proteobacteria. Therefore, the study of gastric juice microbiota can provide a theoretical basis for the mechanism of GC progression.
The composition of the microbiota in the stomach can be influenced by external factors, such as diet, proton pump inhibitors and antibiotics. In vivo animal experiments have found [17] that long-term use of yogurt containing probiotics such as Bifidobacterium and Lactobacillus can reduce the inflammatory response and inhibit intestinal metaplasia induced by HP infection in Mongolian gerbils, possibly because probiotics can induce an increase in IL-10 expression and a decrease in TNF-α expression. Paroni et al. [18] used 16S rRNA gene sequencing in patients with dyspepsia and showed a different gastric microbiota between patients treated with and without omeprazole. Rosenvinge et al. [19] described bacterial and fungal microbiota in the gastric juices of 25 patients by gene sequencing of bacterial 16S rRNA variable regions and fungal ITS regions and concluded that antibiotics can reduce bacterial but not fungal biodiversity. Therefore, actively searching for different gastric juice microbiota in GC provides a basis for targeted treatment in the future, and it is conducive to improving the prognosis and survival rate.
In the last decade, we have witnessed a revolution in sequencing technology, which has enabled us to understand many concepts in genetics and genome biology [20]. Next-generation sequencing (NGS) technology solves the problem of detecting a large number of gene changes at one time. It can analyze point mutations, base deletion insertion mutations, gene copy number changes and gene fusion mutations at the same time and has the advantages of high accuracy, low sample demand and a short detection cycle. NGS has been demonstrated in different phase I and II trials, expanding our knowledge of the gastrointestinal microbiome. Many studies have shown that the diversity and community structure of gastrointestinal microbes change in malignant tumors such as GC [21], lung cancer [22] and intestinal tumors [23].
In this study, we collected gastric juice samples from 61 healthy people, 48 patients with early GC and 30 patients with advanced GC. Using bacterial 16S rRNA detection, we analyzed the community structure, alpha diversity, differences and correlations of the gastric juice microbiota between healthy people and GC patients and between early GC patients and advanced GC patients. By screening the bacterial flora of gastric juice associated with advanced GC, this research can provide data support for the screening of GC and the diagnosis and treatment of advanced GC.

Data Acquisition
From December 2020 to August 2021, 48 patients with TNM stage I-II early GC and 30 patients with TNM stage III-IV advanced GC were admitted to the Department of Gastrointestinal Surgery and Oncology of Huzhou Central Hospital, and 61 healthy controls were recruited by the Health Examination Center.
The clinical trial and informed consent of the included patients were approved by the Ethics Committee of Huzhou Central Hospital (No. 20201205-02). The raw sequencing data have been deposited into the NCBI Sequence Read Archive (SRA) database under the accession numbers PRJNA890002 and PRJNA890143. The general condition of the patients is described in Additional file 1: Table S1 and Additional file 2: Table S2.
Inclusion criteria for GC patients: (1) patients diagnosed with GC by pathology, with clinical staging according to the American Joint Committee on Cancer (AJCC) cancer staging guidelines; (2) no surgery, chemotherapy, radiotherapy or other treatments; (3) informed consent signed by the patient or the patient's representative.
Inclusion criteria for healthy controls: The healthy control group had no respiratory diseases, gastrointestinal diseases, oral diseases, malignant tumors or tumor-related symptoms in the past two years.
The exclusion criteria were as follows: (1) complications with other malignant tumors; (2) complications with serious cardiopulmonary diseases; (3) inability to undergo gastroscopy; (4) history of antibiotic, hormone or gastrointestinal microbiotic use 3 months before admission; and (5) chronic gastric ulcer, gastritis and other stomach diseases.

Collection and processing of microbial specimens in gastric juice
Collection and processing of gastric juice samples: subjects were instructed to abstain from food and drink for more than 12 h before collection so that the gastric contents were emptied. The next morning, 50 ml of sterilized water for injection was injected into the stomach through a gastroscope, and 50 ml of gastric juice was extracted into a labeled container for examination and analysis of the bacterial flora.

MiSeq sequencing of the microbial genome
(1) Genomic DNA extraction: Bacterial DNA was extracted from the gastric juice microbial samples using a bacterial DNA extraction kit. A NanoDrop 2000 was used for DNA purity analysis. After qualified samples were selected, 16S rRNA sequencing was performed by a commercial sequencing company. (2) PCR amplification: Specific primers with barcodes were synthesized. The 16S rDNA primer sequences for the V1-V4 regions were 357F: 5ʹ-TAC GGG AGG CAG CAG-3ʹ and 1114R: 5ʹ-GCA ACG AGC GCA ACCC-3ʹ. To ensure the accuracy and reliability of subsequent data analysis, two conditions were applied: (a) amplification used as few cycles as possible; (b) the number of amplification cycles was the same for every sample. A representative sample was randomly selected for a pre-experiment to ensure that the majority of samples yielded product at a suitable concentration with the lowest number of cycles. Polymerase chain reaction (PCR) products of the same sample were mixed and checked by 2% agarose gel electrophoresis. PCR products were recovered by gel extraction using an AxyPrepDNA gel recovery kit (AXYGEN company), eluted with Tris-HCl, and checked again by 2% agarose gel electrophoresis. The PCR products were quantified with the QuantiFluor™-ST blue fluorescence quantification system (Promega company) based on the preliminary electrophoresis results and then mixed in proportion to the sequencing volume required for each sample. (3) MiSeq library construction: "Y"-shaped adapters were ligated; magnetic bead screening was used to remove adapter self-ligation fragments; the library template was enriched by PCR amplification; and the library was denatured with sodium hydroxide to produce single-stranded DNA fragments. (4) MiSeq sequencing: one end of the DNA fragment is complementary to the primer bases and fixed on the chip; the other end is randomly complementary to another nearby primer and is also fixed, forming a "bridge". PCR amplification was performed to generate DNA clusters.
The DNA amplicons were linearized into single strands. The modified DNA polymerase and four fluorescently labeled dNTPs were added so that only one base was synthesized per cycle. The surface of the reaction plate was scanned by laser to read the type of nucleotide incorporated in the first round of reaction for each template sequence. The "fluorophore" and the "termination group" were chemically cleaved to restore the reactivity of the 3' end so that the second nucleotide could be polymerized. The fluorescence signals collected in each round were compiled to obtain the sequence of the template DNA fragments.

Bioinformatics analysis
OTU clustering analysis: The UPARSE (version 7.1) method was used for OTU clustering. The sequence similarity within OTUs was set to 97%, and a representative sequence was obtained for each OTU. UCHIME (version 4.2.40) was used to detect chimeric sequences generated during PCR amplification and remove them from the OTUs. The Usearch_global method was used to map the optimized sequences back to the OTU representative sequences, yielding the OTU abundance table for each sample. Diversity analysis: To study the microbial diversity of the samples, single-sample (alpha) diversity analysis was used; it reflects the richness and diversity of the microbial community through a series of statistical indices. Mothur software (https://www.mothur.org/wiki/Download_mothur) was used to calculate the Chao index and Ace index to assess the richness of the flora. The Shannon index and Simpson index were calculated to evaluate the diversity of the flora.
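As a rough illustration of the alpha diversity indices named above, the following Python sketch computes the Shannon index, a Simpson (dominance) index and a bias-corrected Chao1 richness estimate from a vector of OTU counts. It is not the mothur implementation; the function names and the example counts are hypothetical, and the Ace index is omitted for brevity.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson dominance index D = sum(p_i^2); lower D means higher diversity."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU counts for one gastric juice sample (assumed data).
otu_counts = [10, 5, 1, 1, 2, 0]
print("Shannon:", shannon(otu_counts))
print("Simpson:", simpson(otu_counts))
print("Chao1:", chao1(otu_counts))
```

Shannon and Simpson describe diversity (richness together with evenness), whereas Chao1 estimates richness alone, which is why the text reports the two pairs of indices separately.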
Community structure analysis: The RDP classifier Bayesian algorithm was used to perform taxonomic analysis on the representative sequences of OTUs clustered at 97% similarity, and the community composition of each sample was counted at each level (phylum, class, order, family, genus, species). Principal component analysis (PCA) uses variance decomposition to show the differences among multiple groups of data on a two-dimensional coordinate map, whose two axes correspond to the eigenvalues that capture the largest variance. The position of each sample in each dimension was recorded, the contribution of each OTU to each principal component was calculated, and PCA statistical analysis and plotting were performed using the R language. Principal coordinates analysis (PCoA) first sorts a series of eigenvalues and eigenvectors, then selects the leading eigenvalues and displays them in the coordinate system; the R language was used for PCoA statistical analysis and plotting. Finally, Excel was used to draw a percentage stacked bar chart.
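The variance decomposition behind PCA can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R workflow used in the study: it centers a small sample-by-taxon matrix, builds the covariance matrix, and finds the first principal component by power iteration. The function name, starting vector and abundance values are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction of PC1, fraction of total variance it explains)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the taxa (columns).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue. The asymmetric start vector is
    # an arbitrary choice; a start orthogonal to PC1 would fail.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Hypothetical relative abundances (rows: samples, columns: three genera).
abundance = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.20, 0.60, 0.20],
    [0.25, 0.55, 0.20],
]
direction, explained = first_principal_component(abundance)
print("PC1 direction:", direction)
print("Variance explained by PC1:", explained)
```

The explained-variance fraction is the quantity usually printed on each axis label of a PCA plot; the second axis would be obtained in the same way after removing the first component.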
LEfSe multilevel discriminant analysis of species differences: LEfSe identifies features with statistically significant differences between groups and then performs additional tests to assess whether these differences are consistent with the expected biological behavior. First, the nonparametric factorial Kruskal-Wallis (KW) sum-rank test detects features with significant differences in abundance and finds the taxa whose abundance differs significantly. Finally, LEfSe linear discriminant analysis (LDA) was used to estimate the effect size of the abundance of each component (species) on the difference. Machine learning model construction: with the differential gastric flora as the modeling features, an integrated intelligent colorectal cancer screening system was applied to construct a prediction model for advanced GC by logistic regression (LR), random forest (RF), neural network (NN), support vector machine (SVM), CatBoost, and gradient boosted decision tree (GBDT).
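The first LEfSe step, the Kruskal-Wallis sum-rank test, can be illustrated with a short Python sketch that computes the H statistic for one taxon across groups. This is not the LEfSe code; the function name and abundance values are hypothetical, and the conversion of H to a P value (chi-square distribution with k - 1 degrees of freedom) is left out.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted((value, g) for g, group in enumerate(groups)
                    for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j
    h = (12 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n_total + 1))
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Hypothetical relative abundances of one genus in three groups (assumed data).
healthy = [0.02, 0.03, 0.01, 0.04]
early_gc = [0.10, 0.12, 0.08, 0.09]
advanced_gc = [0.20, 0.18, 0.25, 0.22]
print("H =", kruskal_wallis_h(healthy, early_gc, advanced_gc))
```

Because the test works on ranks rather than raw abundances, it makes no normality assumption, which suits skewed relative-abundance data.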

Statistical analysis
Statistical analysis was performed using SPSS V25.0 (SPSS Inc., Chicago, IL). For continuous variables, the independent-sample t test was used for single-factor analysis between two groups, and the chi-square test was used for categorical variables. GraphPad Prism version 8.0 (San Diego, CA) and the Tutools platform (http://www.cloudtutu.com) were used for the preparation of graphs.
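For readers who want to see what the two tests compute, here is a minimal Python sketch of the pooled-variance (Student's) t statistic and the Pearson chi-square statistic for a 2x2 table. It is an illustration only and does not reproduce SPSS output; the function names and the example values are hypothetical, and P values are not computed.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_variance = ((n_a - 1) * statistics.variance(sample_a)
                       + (n_b - 1) * statistics.variance(sample_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, n_a + n_b - 2


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical ages in two groups and a hypothetical sex-by-group table.
t_value, dof = student_t([55, 60, 62, 58], [63, 66, 70, 65])
print("t =", t_value, "df =", dof)
print("chi-square =", chi_square_2x2([[30, 31], [45, 33]]))
```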

Descriptive analysis of Bacteria from healthy controls and GC patients
By comparing the bacterial diversity and community of the gastric juice of the healthy group and GC group, it was found that there was no difference in the diversity of bacteria between the two groups at the genus level (P > 0.05) (Fig. 1A-B). However, there were differences in the abundance of bacterial flora between the two groups (P < 0.05) (Fig. 1C-D). The sequencing depth is shown in Additional file 3: Table S3. The bacterial community structure of gastric juice was different between the two groups. Streptococcus was widely distributed in the GC group, and Rhodococcus was widely distributed in the healthy group (Fig. 1E). The top five bacteria with the highest composition ratio in the two groups were Streptococcus, Rhodococcus, Prevotella, Pseudomonas and Helicobacter. Streptococcus and Helicobacter accounted for a larger proportion in the GC group, whereas Rhodococcus, Pseudomonas and Ochrobactrum accounted for a larger proportion in the healthy group (Fig. 1F). A Venn diagram showed that there were 886 common bacteria in the gastric juice of the two groups, 461 unique bacteria in the healthy group and 97 unique bacteria in the GC group (Fig. 1G).

Differential bacteria between healthy controls and GC patients
The differences in bacteria in the gastric juice of the two groups were analyzed, and 15 bacteria, including Streptococcus, Rhodococcus and Ochrobactrum, were screened out (Fig. 2A). LEfSe analysis was used to compare the different bacteria between the two groups. The results showed that the characteristic bacteria of the healthy group were 112 species, such as Rhodococcus, Ochrobactrum and Pseudomonas, indicating that these bacteria were more important in the healthy group but less important in the GC group. The characteristic bacteria of the GC group were 40 species, such as Streptococcus, Veillonella and Lactobacillus (Fig. 2B-C).
Fig. 1 [...] Simpson index. C Ace index. D Chao index. * represents a significant difference between the two groups (p < 0.05). E Community composition of the bacterial community. The ordinate is the name of the sample, and the abscissa is the proportion of bacteria in the sample. Different colors of the column represent different species, and the length of the column represents the size of the proportion of the species. F A histogram of percentage accumulation drawn for the top 30 bacteria with the highest abundance in the two groups. G Venn diagram. Red is GC, blue is healthy, and the number of nonoverlapping species represents the number of species unique to the corresponding group.

Correlation of differentially abundant bacteria between healthy people and GC patients
To further clarify the relationship between different bacteria in GC, the correlation between the differential bacteria was analyzed. The significantly correlated differential bacteria in the healthy group were Ochrobactrum and Rhodococcus (r = 0.916, P < 0.001) and Paracoccus and JG30-KF-CM45 (r = 0.987, P < 0.001) (Fig. 3A). The significantly correlated differential bacteria in the GC group were Ochrobactrum and Paracoccus (r = 0.650, P < 0.001), Ochrobactrum and Rhodococcus (r = 0.901, P < 0.001), and Paracoccus and Rhodococcus (r = 0.537, P < 0.001) (Fig. 3B). Bacteria such as Streptococcus and Helicobacter were more closely related to GC, and bacteria such as Rhodococcus and Pseudomonas were more closely related to healthy people (Fig. 3C).
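The r values reported here are Pearson correlation coefficients between the abundances of two bacteria across samples. A minimal Python sketch of that calculation is given below; the function name and the abundance vectors are hypothetical and are not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    sum_sq_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical per-sample relative abundances of two genera (assumed data).
genus_a = [0.10, 0.22, 0.05, 0.30, 0.18]
genus_b = [0.08, 0.20, 0.07, 0.27, 0.15]
print("r =", pearson_r(genus_a, genus_b))
```

Values of r close to 1 indicate that the two genera rise and fall together across samples, as reported for Ochrobactrum and Rhodococcus.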

Construction of a prediction model for GC
Six machine learning algorithms were used to construct a logistic regression model, random forest model, neural network model, support vector machine model, gradient boosting decision tree model, and CatBoost model.
Fig. 3 Correlation analysis of different bacteria within and between groups. The numerical matrix of the two groups of different bacteria is visually displayed through the heatmap. The color change reflects the data information, and the color depth represents the correlation. The redder the color is, the higher the correlation between the two bacteria. A Intragroup bacterial correlation heatmap of the healthy group. The Pearson coefficient was used to calculate the correlation between the bacteria. The shade of color indicates the size of the data value. Pearson correlation coefficients are indicated in the figure. *0.01 < p < 0.05; **0.001 < p ≤ 0.01; ***p ≤ 0.001. B Intragroup bacterial correlation heatmap of the GC group. C Chord diagram. One side of the circle is the species name, and the other side is the sample name, which is represented by different colors. The species abundance is displayed as a percentage.
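To make the model-building step in this section more concrete, the sketch below trains the simplest of the six model types, logistic regression, by plain gradient descent on bacterial abundance features and reports classification accuracy. It is a minimal illustration under stated assumptions, not the pipeline used in the study: the feature values, labels, learning rate and function names are hypothetical, and no train/test split or cross-validation is shown.

```python
import math


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1 / (1 + math.exp(-z)) - y  # predicted probability minus label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(features)
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (GC) if the linear score is non-negative, else 0 (healthy)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z >= 0 else 0


# Hypothetical relative abundances of two genera per sample (assumed data).
samples = [[0.10, 0.60], [0.20, 0.50], [0.15, 0.55],   # healthy
           [0.60, 0.10], [0.50, 0.20], [0.55, 0.15]]   # GC
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(samples, labels)
correct = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
accuracy = correct / len(labels)
print("Training accuracy:", accuracy)
```

Accuracy on the training samples overstates real performance; a reported figure such as the 82.73% for the random forest model would normally come from held-out or cross-validated data.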

Differential bacteria in the healthy group, early GC and advanced GC
The differences in bacteria in gastric juice among the three groups were analyzed. Fifteen differential bacteria, such as Streptococcus, Rhodococcus, Ochrobactrum, Fusobacterium and Helicobacter, were screened out (Fig. 6A). LEfSe analysis was used to compare the different bacteria among the three groups, and the results showed that the characteristic bacteria of the early GC group were Staphylococcales, Gemella and Gemellaceae, indicating that these bacteria were more important in the early GC group. The characteristic bacteria of the advanced GC group were Lactobacillus, Lactobacillaceae and Slackia (Fig. 6B-C).

Correlation of differentially abundant bacteria in healthy group, early GC and advanced GC
To further clarify the relationship between different bacteria in advanced GC, the correlation between the differential bacteria was analyzed. The significantly associated differentially abundant bacteria in the healthy group were Ochrobactrum and Rhodococcus (r = 0.916, P < 0.001) (Fig. 7A). Those in the early GC group were Ochrobactrum and Rhodococcus (r = 0.926, P < 0.001) (Fig. 7B), and those in the advanced GC group were Stenotrophomonas and Rhodococcus (r = 0.626, P < 0.001) (Fig. 7C). Prevotella was more correlated with the healthy group, Helicobacter was more correlated with early GC, and Streptococcus was more correlated with advanced GC (Fig. 7D).

Discussion
The stomach plays an important role in maintaining gastrointestinal health by providing a barrier against gastrointestinal infectious disease pathogens. The stomachs of patients with GC contain a large number of microbes, and microorganisms play an important role in the occurrence and development of GC. Screening specific gastric microbiota provides a new direction for further exploring the early screening and diagnosis of GC and the treatment of advanced GC. In this study, gastric juice samples were collected from 61 healthy people, 48 patients with early GC and 30 patients with advanced GC.
Fig. 5 Alpha diversity analysis and composition of bacteria in gastric juice samples from the healthy group, early GC group and advanced GC group. The alpha diversity of bacteria among the three groups of gastric juice samples was analyzed at the genus level, and a violin diagram was constructed. Red represents the healthy group, blue represents the early GC group, and green represents the advanced GC group. A Shannon index. B Simpson index. C Ace index. D Chao index. E Coverage index. * represents a significant difference in the three groups (p < 0.05). F Community composition of the gastric juice bacterial community. The ordinate is the name of the sample, and the abscissa is the proportion of bacteria in the sample. Different colors of the column represent different species, and the length of the column represents the size of the proportion of the species. G A histogram of percentage accumulation drawn for the top 30 bacteria with the highest abundance in the three groups. H Venn diagram. The healthy group is shown in red, early gastric cancer in blue, and advanced gastric cancer in green, and the number of nonoverlapping species represents the number of species unique to the corresponding group.
Fig. 6 Multispecies difference test bar chart and diagram of different microflora in the three groups. A Student's t test was used for hypothesis testing of species between the microbial communities of the three groups of samples and to evaluate the significance level of the difference in species abundance. P < 0.05 indicates a significant difference. The closer the line is to the middle, the smaller the standard deviation and the better the central tendency. B The LDA score was obtained by linear discriminant analysis, and the greater the LDA score was, the greater the impact of bacterial abundance on the difference effect. An LDA score of more than 2 indicates a statistically significant difference (P < 0.05). C The graph shows the LEfSe multilevel species hierarchy; from the inner to the outer circle, the rings represent the phylum, class, order, family, genus and species levels. Nodes of different colors indicate the microbial groups that were significantly enriched in the corresponding groups and had a significant influence on the differences. The pale yellow nodes indicate the microbial groups that had no significant difference among the groups.
and Bacteroides were significantly more enriched in GC samples. As the disease progresses to more severe stages, the dominance of HP begins to give way to other bacteria, including Streptococcus, Prevotella, Achromobacter, Citrobacter, Clostridium, Rhodococcus, lactic acid bacteria and Phyllobacter [28]. In the present study, the abundances of Helicobacter and Streptococcus were highest in early GC and lowest in the healthy group. However, Dai et al. [24] found that the abundance of HP increased in non-tumor tissues, while the abundance of Streptococcus, Bacteroides and Prevotella increased in tumor tissues. Dicksved et al. [29] conducted gene sequence analysis on the adjacent gastric mucosa tissues of 10 patients with GC and the gastric mucosa tissues of 5 patients with dyspepsia and found that the number of HP in the gastric mucosa of GC patients was lower, and the main bacteria were Streptococcus and Prevotella. Liu et al. [30] found that Prevotella melaninogenica and Streptococcus anginosus were increased in the tumoral microhabitat, and HP, Prevotella copri and Bacteroides uniformis were significantly decreased. It was suggested that the differences in metabolome profiles between GC tumors and matched non-tumor tissues may be partly due to the collective activities of HP, Lactobacillus and other bacteria, which ultimately affect the occurrence and progression of GC. Hu et al. [31] showed that, compared with superficial gastritis, the microbiome of GC was characterized by the enrichment of a variety of bacterial genera and species, including the genera Neisseria, Alloprevotella and Aggregatibacter and the species Streptococcus_mitis_oralis_pneumoniae. The present study found that Neisseria, Alloprevotella and Aggregatibacter accounted for the highest proportion in advanced GC patients. The results showed that Streptococcus was the bacterium with the highest proportion in early GC and advanced GC, which was consistent with the findings of Zhou et al. [32]. However, Aviles-Jimenez [14] found that the amount of Streptococci in GC patients decreased, and the diversity of bacteria gradually decreased during the transition from nonatrophic gastritis to intestinal metaplasia and intestinal GC. In the present study, the abundance of Lactobacillus was increased in the advanced GC group, and Fusobacterium was abundant in both the advanced GC group and the healthy group. However, Fusobacterium nucleatum has been found in GC [33,34], and its presence in tumors is thought to be associated with poor survival [35].
In conclusion, the gastric microbiome plays an important role in gastric carcinogenesis. The community structure and diversity of the gastric microbiota change in patients with advanced GC, and the gastric microbiota can be used to distinguish early and advanced GC patients.
Bacteria can promote anti-tumor immune responses through a variety of mechanisms, such as triggering T cell responses to bacterial antigens, or inducing tumor-specific antigen recognition through small metabolites that mediate systemic effects on the host [36]. HP infection induces nonspecific inhibition of circulating T cells [37]. Certain bacteria, particularly lactic acid bacteria and Streptococcus, increased the number of CD3+ T cells, CD4+ T cells and natural killer cells [38]. The present study found that Lactobacillus differed among the three groups. Lactic acid can promote chemotherapy resistance [39] and tumor growth [40]. Lactic acid bacteria such as Streptococcus, Lactobacillus and Lactococcus have been linked to the progression of GC. They can influence the development of GC in a variety of ways, including increasing N-nitroso compounds that cause DNA damage and regulating the expression of key molecules important in cancer development [41]. Patients with GC have been reported to harbor a dysbiotic microbial community with genotoxic potential. Microbial communities found in GC showed increased nitrosation function, consistent with increased genotoxic potential [25]. The present study found that the abundance of Veillonella was relatively higher in the GC group. Studies have found that the presence of Veillonella was related to the tumor response to nivolumab [5]. A unique group of bacteria, including Streptococcus, Parvimonas, Prevotella, Rothia and Bacillus, were potential therapeutic targets for the prevention of GC [42]. Recent consensus and meta-analysis reports indicate that eradication of HP reduces the incidence of GC by a factor of 0.55 [43]. It follows that changes in the gastric microbiome may need to be considered to improve the therapeutic effect.
Some scholars have found that Streptococcus anginosus and Streptococcus constellatus are enriched in the feces of patients with precancerous lesions and early GC, and therefore they can be used for early screening of GC. However, fecal sampling does not take into account the influence of the upper digestive tract. In this study, the gastric juice microbiota was studied in healthy people and GC patients (early and advanced GC) to provide a reference for the method of obtaining gastric juice samples. Because the collection of gastric juice specimens is easily affected by many factors, such as the intubation route, the depth of the tube, patient posture and the aspiration method, a standardized process for gastric juice sample collection and processing was established in this study. A standardized, simple and convenient noninvasive microbial test method of this kind may provide a new direction for investigating the mechanism of GC occurrence. This study explored new methods for studying the bacteria of the human digestive tract from a microbiological perspective, screened harmful bacteria associated with GC progression, and provided data support for the study of the mechanism of GC progression.
The gut microbiota is an exploratory biomarker for GC immunotherapy, and searching for microbiota that can inhibit the progression of GC provides a research direction for the microbial treatment of advanced GC. Analyzing the microbial community composition and structure in the gastric juice of healthy people, early GC patients and patients with advanced GC can provide data support for the screening of early GC and the diagnosis and treatment of advanced GC.
There are some limitations in this study. The sample size in this study was insufficient, and there may be false negative results. However, this study provides ideas and references for subsequent multicenter and large-scale studies to elucidate the relationship between the occurrence and development of GC and gastric juice microbes. Although potential procancer bacteria and anticancer bacteria have been screened for GC, the specific mechanism of harmful bacteria, such as Streptococcus, leading to advanced GC is unclear and requires further molecular experiments for verification. Further research is needed to determine whether probiotics such as Lactobacillus can be used as adjuvant drugs in the treatment of GC.